MGEScan-non-LTR: computational identification and classification of autonomous non-LTR retrotransposons in eukaryotic genomes
Abstract
Computational methods for genome-wide identification of mobile genetic elements (MGEs) have become increasingly necessary for both genome annotation and evolutionary studies. Non-long terminal repeat (non-LTR) retrotransposons are a class of MGEs found in most eukaryotic genomes, sometimes in extremely high numbers. In this article, we present MGEScan-non-LTR, a computational tool for identifying non-LTR retrotransposons in genomic sequences using an approach inspired by generalized hidden Markov models (GHMMs). Three states represent the two protein domains encoded by non-LTR retrotransposons and the inter-domain linker region between them. Domain states are scored with profile hidden Markov models, and linker states are scored with Gaussian Bayes classifiers. To classify each element into one of 12 previously characterized clades within the same model, we defined a separate set of states for each clade. We tested MGEScan-non-LTR on the genomes of four eukaryotes: Drosophila melanogaster, Daphnia pulex, Ciona intestinalis and Strongylocentrotus purpuratus. In the D. melanogaster genome, MGEScan-non-LTR found all known 'full-length' elements and simultaneously classified them into the clades CR1, I, Jockey, LOA and R1. Notably, in the D. pulex genome, where no non-LTR retrotransposons had previously been annotated, MGEScan-non-LTR found significantly more elements than RepeatMasker did with the then-current version of the RepBase Update library. We also identified novel elements in the other two genomes, whose non-LTR retrotransposons have been only partially studied.
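The abstract describes the model only at a high level. The Python sketch below illustrates the general idea: one candidate element is scored as a path through a per-clade chain of states (domain, linker, domain), and the best-scoring clade is reported. It is not MGEScan-non-LTR's code. Several assumptions apply. Domain hits and their per-clade scores are taken as given; in practice they would come from a profile-HMM search. The linker state is scored only by a Gaussian log-likelihood of linker length. This stands in for the paper's Gaussian Bayes classifier, whose features the abstract does not specify. The names `DomainHit`, `best_parse`, `CladeX` and `CladeY`, and all numbers, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DomainHit:
    """Hypothetical protein-domain hit on a translated genomic sequence."""
    start: int                # first residue of the hit
    end: int                  # last residue of the hit
    scores: Dict[str, float]  # per-clade log-odds scores (assumed to come from profile HMMs)


def gaussian_log_pdf(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution; here it scores linker length (an assumption)."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def best_parse(
    first: List[DomainHit],
    second: List[DomainHit],
    linker_models: Dict[str, Tuple[float, float]],
) -> Optional[Tuple[float, str, DomainHit, DomainHit]]:
    """Return (score, clade, first_hit, second_hit) for the best-scoring path, or None.

    Each clade has its own chain: domain 1 -> linker -> domain 2.
    Path score = domain-1 score + linker log-likelihood + domain-2 score.
    With only three fixed segments, enumerating ordered hit pairs gives the
    same optimum as a Viterbi pass over this chain.
    """
    best = None
    for clade, (mean, sd) in linker_models.items():
        for a in first:
            for b in second:
                linker_len = b.start - a.end - 1
                if linker_len < 0:  # domains must be ordered and non-overlapping
                    continue
                total = (a.scores[clade]
                         + gaussian_log_pdf(linker_len, mean, sd)
                         + b.scores[clade])
                if best is None or total > best[0]:
                    best = (total, clade, a, b)
    return best


# Invented example data (hypothetical clades and parameters, for illustration only).
models = {"CladeX": (300.0, 40.0), "CladeY": (600.0, 60.0)}
d1 = [DomainHit(100, 300, {"CladeX": 50.0, "CladeY": 48.0})]
d2 = [DomainHit(601, 850, {"CladeX": 70.0, "CladeY": 69.0}),
      DomainHit(50, 90, {"CladeX": 90.0, "CladeY": 90.0})]  # lies before d1, so it is skipped

result = best_parse(d1, d2, models)
print(result[1], result[3].start)  # prints "CladeX 601"
```

In this example the 300-residue linker matches `CladeX`'s assumed mean length, so that clade wins even though its domain scores are only slightly higher. A real tool would instead learn these parameters from curated elements of each clade.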